Development and validation of a prognostic nomogram for predicting cancer-specific survival in lymph node-negative elderly esophageal cancer patients: A SEER-based study

In this study, we explored the prognostic risk factors of elderly patients (≥65 years old) with lymph node-negative esophageal cancer (EC) and established a nomogram to evaluate their cancer-specific survival. Data on patients diagnosed with EC were collected from the Surveillance, Epidemiology, and End Results database. Univariate and multivariate Cox analyses were used to determine independent prognostic factors, and the nomogram for predicting cancer-specific survival was constructed from the independent prognostic factors identified in the multivariate Cox analysis. The predictive ability of the nomogram was evaluated using calibration curves, the concordance index (C-index), receiver operating characteristic curves, and decision curve analysis. The Kaplan–Meier method was used to analyze the long-term outcomes of EC patients in different risk strata. A total of 3050 patients with lymph node-negative EC were randomized into the training cohort (n = 1525) and the validation cohort (n = 1525). Cancer-specific mortality at 1, 3, and 5 years in the entire cohort was 30.7%, 41.8%, and 59.2%, respectively. In multivariate Cox analysis, age (P < .001), marital status (P < .001), tumor size (P < .001), tumor-node-metastasis stage (P < .001), chemotherapy (P = .011), radiotherapy (P < .001), and surgery (P < .001) were independent prognostic factors. The C-index was 0.740 (95% confidence interval [CI]: 0.722–0.758) in the training cohort and 0.738 (95% CI: 0.722–0.754) in the validation cohort. The calibration curves demonstrated good calibration of the nomogram. Based on the area under the receiver operating characteristic curve, the nomogram showed higher predictive efficacy than the tumor-node-metastasis stage. Decision curve analysis showed good clinical utility of the nomogram. The risk stratification system was established using the Kaplan–Meier curve and verified by the log-rank test (P < .001).
The nomogram and risk stratification system can improve the accuracy of prediction to help clinicians identify high-risk patients and make treatment decisions.


Introduction
Esophageal cancer (EC) is one of the leading causes of cancer death worldwide, ranking seventh in incidence and sixth in mortality; in 2020, EC accounted for 1 in 18 cancer deaths. As the incidence rises in parallel with the average age of the global population, more elderly individuals will be diagnosed with EC in the future. [1][2][3] Numerous studies have also shown that older age is an important risk factor for death in EC patients. [4,5] Therefore, further attention should be paid to the prognosis of the elderly.
According to the National Comprehensive Cancer Network guidelines, surgery-based comprehensive treatment is the mainstream treatment option for lymph node-negative EC. However, for the elderly, it is important to assess not only the patient's tolerance of surgery but also the ability to recover from surgical trauma.
Surgery is not the first choice for elderly patients who cannot undergo an operation because of poor health, and adjuvant therapy remains an option for lymph node-negative EC patients. The relationship between adjuvant therapy and the prognosis of node-negative EC patients has been explored in numerous studies, but the conclusions remain uncertain. [6][7][8][9][10] Furthermore, prognostic factors such as age, sex, and tumor size should be considered when evaluating these patients. Therefore, we aimed to achieve a more accurate prognostic analysis by analyzing demographic characteristics, clinicopathological characteristics, and treatment methods recorded in the Surveillance, Epidemiology, and End Results (SEER) Program database.
The tumor-node-metastasis (TNM) stage system is widely recognized and applied in tumor prognosis. However, in recent years, nomograms have shown better accuracy than the TNM stage in predicting prognosis in various types of cancer. [11][12][13][14] Although some nomograms have been developed for EC, [15][16][17] there have been no relevant studies on node-negative elderly patients with EC. Based on the SEER database, we developed and validated a nomogram that can be used to predict cancer-specific survival (CSS) at 1, 3, and 5 years in node-negative elderly EC patients.

Patient selection
We retrieved data online using the SEER*Stat software Version 8.4.0.1 (http://seer.cancer.gov/seerstat/). A total of 13,157 patients with T1-4N0M0 stage EC diagnosed between 2004 and 2015 were retrieved from the "incidence-SEER Research Plus Data, 18 Registries" database. The exclusion criteria were: (1) patients younger than 65 years old or with unknown age; (2) nonprimary malignant tumors or patients with unknown tumor status; (3) patients with no information on race, tumor size, marital status, surgery, radiotherapy, chemotherapy, follow-up data, or TNM stage; (4) patients with a survival time of <1 month. A total of 3050 patients were included in the study through the above process (Fig. 1). The patients were randomly divided at a 1:1 ratio into the training cohort (n = 1525) and the validation cohort (n = 1525). In this study, patient demographic characteristics, including age, sex, race, and marital status; clinicopathological characteristics, including primary tumor site, histology, tumor size, grade, and TNM stage; treatment information, including surgery, radiotherapy, and chemotherapy; and follow-up data, including survival time and survival status, were collected from the SEER database. The observational endpoint of the study was CSS, defined as the time between diagnosis and death attributable to EC. Moreover, because age and tumor size were recorded as numeric variables, their optimal cutoff values for the prognostic analysis of EC patients were determined using X-tile (Yale University, version 3.6.1), and both were converted into categorical variables.
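The selection and splitting steps above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used SEER*Stat and R); the record field names, the `first_primary` flag, and the random seed are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical field names; SEER variable names differ.
REQUIRED_FIELDS = ("race", "tumor_size", "marital_status", "surgery",
                   "radiotherapy", "chemotherapy", "survival_months", "tnm_stage")

def is_eligible(record):
    """Apply the four exclusion criteria to one patient record (a dict)."""
    age = record.get("age")
    if age is None or age < 65:                # (1) younger than 65 or unknown age
        return False
    if record.get("first_primary") is not True:  # (2) nonprimary or unknown status
        return False
    if any(record.get(f) is None for f in REQUIRED_FIELDS):  # (3) missing data
        return False
    return record["survival_months"] >= 1      # (4) survival time of <1 month

def split_cohort(patient_ids, seed=2023):
    """Randomly split IDs 1:1 into training and validation halves.

    The seed is an assumption; the paper does not report one.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]
```

With 3050 eligible patients, `split_cohort` returns two disjoint lists of 1525 IDs each, matching the cohort sizes reported above.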

Statistical analysis
The Pearson chi-square test was used to analyze differences in the distribution of categorical variables between the training cohort and the validation cohort. Univariate Cox analysis was used to select statistically significant variables for inclusion in the multivariate Cox analysis, and the nomogram was established from the independent prognostic factors identified in the multivariate Cox analysis. To validate the nomogram, calibration curves were used to evaluate calibration, the concordance index (C-index) was calculated to estimate discrimination, and receiver operating characteristic (ROC) curves were used to compare the discriminative ability of the nomogram with that of the TNM stage system. Clinical utility was evaluated using decision curve analysis (DCA). To reduce overfitting bias, bootstrapping with 1000 resamples was used.
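As an illustration of what the C-index measures, the following is a minimal Python sketch of Harrell's concordance index in its basic pairwise form. It is an assumption-laden teaching example, not the R routine used in the study: it ignores tied event times and treats a higher `risk_scores` value as a worse predicted prognosis.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    higher predicted risk experiences the event first.

    times       -- follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher value means higher predicted risk (assumption)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the nomogram and the TNM stage are compared.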
All statistical analyses were performed in R (version 4.2.1, http://www.R-project.org). The "tableone" package was used for the chi-square test. The "rio" package was used for data output. The "survival" and "survminer" packages were used for the Kaplan-Meier curve and the log-rank test. The "survival", "forestplot", and "survminer" packages were used for univariate and multivariate Cox analysis, nomogram establishment, and C-index calculation. The "rms" and "survival" packages were used to establish the calibration curves. The "riskRegression" and "survival" packages were used to establish the ROC curves. The "rms", "ggDCA", "ggprism", and "survival" packages were used to establish the DCA. P < .05 was considered statistically significant.

Prognostic risk stratification
The scores obtained from the nomogram were assigned to each independent prognostic factor, and the total score was calculated by adding up the scores of each prognostic variable. The best cutoff values of the total score were determined using X-tile, and patients were divided into 3 categories (low-, middle-, and high-risk groups). The Kaplan-Meier survival curve and the log-rank test were used to analyze the differences in long-term prognosis for CSS of the 3 subgroups.

Table 1. Characteristics of the training and validation cohorts (χ² test). Columns: Variables; Whole cohort (n = 3050); Training cohort (n = 1525); Validation cohort (n = 1525); P.
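The Kaplan-Meier estimate used to compare the risk groups can be sketched in a few lines of Python. This is a minimal illustration of the product-limit estimator, assuming events are coded 1 (cancer-specific death) and 0 (censored); the study itself used the R "survival" and "survminer" packages.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    Returns a list of (time, survival_probability) pairs, one per distinct
    time at which at least one event was observed.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve
```

Applying the function separately to the low-, middle-, and high-risk groups yields the three curves that the log-rank test then compares.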

Ethical Statement
Publicly available data were obtained from the SEER database (http://seer.cancer.gov/seerstat/) for use in the current study. Therefore, the study was exempt from institutional review board approval.

Patient characteristics
A total of 3050 patients were enrolled from the SEER database for the analysis, including 1525 patients in the training cohort and 1525 patients in the validation cohort. The demographic, clinicopathological, and treatment information were summarized (

Construction and validation of the nomogram
Based on the results of the multivariate Cox analysis, we established a nomogram (Fig. 2). The independent prognostic factors included age, marital status, tumor size, TNM stage, surgery, chemotherapy, and radiotherapy. In the training cohort, the C-index of the prognostic nomogram and the TNM stage were 0.740 (95% confidence interval [CI]: 0.722-0.758) and 0.574 (95% CI: 0.554-0.594), respectively. In the validation cohort, the C-index of the prognostic nomogram and the TNM stage were 0.738 (95% CI: 0.722-0.754) and 0.574 (95% CI: 0.554-0.594), respectively. To verify calibration, we plotted calibration curves at 1, 3, and 5 years for the nomogram (Fig. 3). The calibration curves for both the training cohort and the validation cohort indicate that the nomogram-predicted survival is highly consistent with the actual survival rate. The ROC curves showed that the nomogram established in this study had higher predictive efficacy than the TNM stage at 1, 3, and 5 years (Fig. 4). The DCA curves indicated that the nomogram has good clinical applicability in predicting 1-, 3-, and 5-year survival (Fig. 5).
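For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability. The sketch below is a simplified Python illustration for a binary outcome at a fixed time point; it is an assumption that censoring can be ignored here, whereas survival-based DCA such as that produced in the study accounts for censored follow-up. Function and argument names are hypothetical.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating every patient whose predicted risk is at or
    above `threshold` (binary outcome: 1 = event, 0 = no event)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).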

Establishment of risk stratification
The best cutoff value of the total score was calculated using X-tile software, and the patients were divided into 3 subgroups: low risk (<135 points), middle risk (136-243 points), and high risk (>244 points). We observed a significant survival difference among the 3 groups of patients according to the log-rank test (P < .001) (Fig. 6).
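The scoring and grouping rule can be written as a short Python sketch. The cutoffs are the ones reported above; the per-factor point values passed to `total_score` are hypothetical, because the nomogram's point assignments are given only in Fig. 2. Scores that the stated ranges do not cover (for example exactly 135 or 244) are returned as "unassigned" rather than guessed.

```python
LOW_CUTOFF = 135    # low risk: total score < 135
MIDDLE_MIN = 136    # middle risk: 136 to 243 points
MIDDLE_MAX = 243
HIGH_CUTOFF = 244   # high risk: total score > 244

def total_score(points_by_factor):
    """Sum the nomogram points of all prognostic factors for one patient.

    `points_by_factor` maps a factor name to its points; the values are
    read off the nomogram and are not reproduced here.
    """
    return sum(points_by_factor.values())

def risk_group(score):
    """Map a total score to the risk group defined by the reported cutoffs."""
    if score < LOW_CUTOFF:
        return "low"
    if MIDDLE_MIN <= score <= MIDDLE_MAX:
        return "middle"
    if score > HIGH_CUTOFF:
        return "high"
    # Not covered by the ranges as stated in the text.
    return "unassigned"
```

For example, a patient whose hypothetical factor points sum to 200 would fall in the middle-risk group.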

Discussion
In this study, we established a nomogram based on the SEER database to predict 1-, 3-, and 5-year CSS in lymph node-negative elderly EC patients, and it performed well in internal validation. Age, marital status, tumor size, TNM stage, surgery, chemotherapy, and radiotherapy were independent prognostic factors for CSS. However, because only the poorly differentiated category of tumor grade was statistically significant in the multivariate Cox regression (P = .009), we did not include tumor grade in the nomogram. The C-index in both the training cohort and the validation cohort was >0.70, indicating that the nomogram had satisfactory predictive performance. Furthermore, the ROC curves showed that the nomogram predicted more effectively than the existing TNM stage system. The calibration curves confirmed good agreement between the nomogram-predicted survival probability and the actual survival probability of the patients. To evaluate the prognosis of individual patients, we assigned each independent prognostic variable a score from the nomogram and calculated the total score for each patient. We divided EC patients into 3 groups according to total score using X-tile, and the Kaplan-Meier curves showed marked survival differences among the groups. With this easy-to-use risk stratification system, clinicians can judge the prognosis of EC patients and plan treatment or follow-up accordingly. In our study, surgery was the most significant prognostic factor, which is consistent with previous studies. [18] However, our data show that only 45.3% of the patients underwent surgery, and the number of patients undergoing surgical resection decreased significantly with increasing age. In general, elderly patients tend to suffer from chronic diseases such as hypertension and diabetes, leading surgeons to prefer conservative treatment.
Therefore, adjuvant therapy is particularly important for this population. However, the impact of radiotherapy and chemotherapy on the long-term prognosis of perioperative patients is still controversial. Gao et al [19] found that, compared with surgery alone, neoadjuvant chemoradiotherapy provided significant survival benefits to cN0 esophageal cancer patients with pathological lymph node-negative or local true lymph node-positive disease. Another study [9] found that T3 patients benefit more from surgery plus perioperative chemotherapy, whereas perioperative chemotherapy offers no survival benefit to T1-2 patients and is an adverse prognostic factor for T1 patients. In this study, we focused not only on patients who underwent esophagectomy but also on elderly patients who cannot undergo surgery because of factors such as poor physical condition and oversized tumors. For these patients, chemotherapy (hazard ratio = 0.755; 95% CI: 0.621-0.916; P = .004) and radiotherapy (hazard ratio = 0.601; 95% CI: 0.486-0.743; P < .001) were significant protective factors.
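The relationship between a hazard ratio, its 95% CI, and the P value can be checked with a short calculation. The Python sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox regression output but is not stated explicitly in the text; the function name is hypothetical.

```python
import math

def wald_p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided P value from a hazard
    ratio and its 95% confidence interval.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Values reported above for patients who did not undergo surgery.
z_chemo, p_chemo = wald_p_from_hr_ci(0.755, 0.621, 0.916)
z_radio, p_radio = wald_p_from_hr_ci(0.601, 0.486, 0.743)
```

Under this assumption, the recomputed P values should land close to the reported ones (about .004 for chemotherapy and below .001 for radiotherapy), with small differences expected from rounding of the published hazard ratios and intervals.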
In previous studies, tumor size was not an independent prognostic factor in patients with node-negative EC. [6,8,9][19][20][21][22] However, tumor size is closely related to the choice of treatment. In our data, only 20.4% (153/620) of patients with a tumor size >59 mm underwent surgery, whereas 69.2% (666/962) of patients with a tumor size <22 mm did, which would result in markedly different prognoses. In our univariate and multivariate analyses, tumor size had significant prognostic value. We attribute this result to our use of X-tile software to determine the optimal cutoff points for tumor size. In contrast, previous studies classified patients according to the investigators' own experience or grouped them arbitrarily, which may have led to inaccurate conclusions.
So far, some studies have constructed nomograms to predict the long-term prognosis of EC patients, [23,24] but these have not been widely used in clinical practice. Moreover, most studies analyzed only EC patients who underwent surgery, ignoring elderly patients who cannot tolerate surgery and therefore receive conservative treatment. In our cohort, only 45.31% (1382/3050) of patients underwent surgery, while 56.95% (1737/3050) and 51.70% (1557/3050) of patients underwent radiotherapy and chemotherapy, respectively. Among patients without surgery, 74.9% (1250/1668) received palliative radiotherapy and 64.0% (1067/1668) received palliative chemotherapy. In our opinion, the prognosis of this population cannot be ignored. Compared with previous studies, this study has several advantages. First, we used X-tile to determine the optimal cutoff values for age and tumor size, which should increase the precision of prognosis prediction; the predictive efficacy of this nomogram was better than that of Zheng et al [23] (0.740 vs 0.708) and Du et al [24] (0.740 vs 0.710). Second, although surgery is the best choice for patients with negative lymph nodes, the prognosis of elderly patients who cannot undergo an operation because of poor health still needs to be considered, and our cohort includes them. Finally, a risk stratification system was established to distinguish patients with different prognoses, giving high-risk patients more opportunities to prolong survival and improve their quality of life.
Although this study established a reliable prediction nomogram, some limitations remain. First, we included adjuvant therapy in the nomogram, but the SEER database does not record the specific chemotherapy regimen, radiotherapy method, or dose, which may bias predictions for EC patients. [25,26] Second, certain risk factors that affect the prognosis of esophageal cancer, such as BMI, postoperative complications, nutritional status, targeted drug therapy, and genetic or molecular indicators, are not available in the SEER database, which contributed to bias. Third, although the SEER database is population-based and the nomogram was internally validated, we did not use external data to confirm the accuracy of the prediction nomogram.

Figure 6. Kaplan-Meier method estimate of cancer-specific survival in the training cohort. Low risk refers to a total score of <135, high risk refers to a total score of >244, and middle risk is between 136 and 243.
In conclusion, we constructed and validated a nomogram for CSS in node-negative elderly EC patients based on the SEER database. The nomogram showed higher accuracy than the TNM stage and can be used to identify high-risk patients in clinical practice.